DATA PROCESSING APPARATUS AND METHOD

Field of Invention 

The present invention relates to data processing apparatus and methods, which
are arranged to detect code words present in material items. In some applications the
code words are used to uniquely identify the material items.

The material could be, for example, any of video, audio, audio/video material,
software programs, digital documents or any type of information bearing material.
Background of the Invention 

A process in which information is embedded in material for the purpose of
identifying the material is referred to as watermarking.

Identification code words are applied to versions of material items for the
purpose of identifying the version of the material item. Watermarking can provide,
therefore, a facility for identifying a recipient of a particular version of the material.
As such, if the material is copied or used in a way which is inconsistent with the
wishes of the distributor of the material, the distributor can identify the material
version from the identification code word and take appropriate action.

Co-pending UK patent applications with serial numbers 0129840.5, 0129836.3,
0129865.2, 0129907.2 and 0129841.3 provide a practical watermarking scheme in
which a plurality of copies of material items are marked with a digital watermark
formed from a code word having a predetermined number of coefficients. The
watermarked material item is for example an image. In one example, the apparatus for
introducing the watermark transforms the image into the Discrete Cosine Transform
(DCT) domain. The digital watermark is formed from a set of randomly distributed
coefficients having a normal distribution. In the DCT domain each code word
coefficient is added to a corresponding one of the DCT coefficients. The watermarked
image is formed by performing an inverse DCT.

Any watermarking scheme should be arranged to make it difficult for users
receiving copies of the same material to collude successfully to alter or remove an
embedded code word. A watermarking scheme should therefore with high probability
identify a marked material item, which has been the subject of a collusion attack. This
is achieved by identifying a code word recovered from the offending material.
Conversely, there should be a low probability of not detecting a code word when a
code word is present (false negative probability). In addition the probability of falsely
detecting a user as guilty, when this user is not guilty, should be as low as possible
(false positive probability).
Summary of Invention

According to an aspect of the present invention there is provided a data
processing apparatus operable to identify one of a plurality of code words present in a
marked version of a material item. The marked version has been formed by combining
each of a plurality of parts of a code word with one of a plurality of units from which
the material item is comprised. The apparatus comprises a recovery processor
operable to recover at least one part of the code word from a corresponding unit of the
marked material item, and a correlator. The correlator is operable to generate for the
marked material unit a dependent correlation value for the part of the code word
recovered from the material unit and the corresponding part of at least one of the
re-generated code words from the set. A detector is operable to determine whether at
least one of the code words is present in the marked material item from the dependent
correlation value for the part of the code word exceeding a predetermined threshold.

The above mentioned co-pending UK patent applications disclose a
watermarking system for marking material items with code words and for detecting the
code words in suspect material items which have been used in ways which offend the
owner or distributor of the material items. The system includes aspects, which
facilitate an encoding process through which material is marked with code words and a
detecting process. As explained above, the code words are detected in accordance
with a predetermined false positive and false negative detection probability. A
detector calculates correlation values representing the correlation of a version of the
code word recovered from the material and each of the code words of the set
re-generated within the detector. Code words are detected if any of the correlation values
for the re-generated code words exceeds a threshold determined in accordance with the
false positive and false negative detection probabilities.



It has been discovered that one factor in reducing the probability of correctly
detecting a code word, and hence increasing the false negative detection probability, is
the presence of corruption in the marked material item. The corruption can have an
effect that the parts of the code word recovered from units of the material may be
corrupted. The corrupted parts can have a detrimental effect on the correlation value
calculated for a particular code word. The effect of some corrupted material units can
be to bias the calculation and so prevent the correlation value exceeding the threshold
for the correct code word.

In the watermarking system disclosed in the above mentioned UK patent
applications, the whole code word is recovered from the material and used to calculate
the correlation value. Accordingly, the calculation provides a correlation value, which
is independent of local statistical variations of the content of the material and the
content of the parts of the code word.

To address a problem associated with a reduction in the correlation value
resulting from corrupt parts of the material, a dependent correlation value is formed.

The dependent correlation value is formed by calculating the correlation value
of a part of the code word only. If the dependent correlation value is sufficient to
exceed the threshold according to the predetermined false detection probability, then a
code word can be declared as being present. However, if the dependent correlation
value is not sufficient to exceed the threshold, then the part of the code word is
combined with a part of the code word recovered from a subsequent image and the
dependent correlation value re-calculated.

If the dependent correlation value for the plurality of successive images does
not exceed the threshold, then the parts of the code word recovered from the next
plurality of successive material units may be combined and the dependent correlation
value for these parts determined. If the threshold is exceeded then the corresponding
code word is identified as being present. However, if the threshold is not exceeded,
then the parts of the code word from the first plurality of images are combined with the
parts from the second plurality of images iteratively, providing an increased code word
length, the dependent correlation value being re-calculated, with a proportionally
increased likelihood of exceeding the threshold. This process is repeated for a
subsequent plurality of images, providing yet further correlation values and increasing
the length of the part of the code word used to determine the dependent correlation
value in a hierarchical fashion.

If the dependent correlation value for any part of the code word includes parts,
which have been corrupted, then the dependent correlation value produced from these
parts will be unlikely to exceed the threshold. However, since other dependent
correlation values will not include these corrupted parts, these dependent correlation
values may exceed the threshold, whereas the independent correlation value
determined for the whole code word may not have exceeded the threshold. This is
because the parts of the code words from the corrupted images would be included in
the calculation of the independent correlation value.

Various further aspects and features of the present invention are defined in the
appended claims.
Brief Description of Drawings

Embodiments of the present invention will now be described by way of
example only with reference to the accompanying drawings, where like parts are
provided with corresponding reference numerals, and in which:

Figure 1 is a schematic block diagram of an encoding image processing
apparatus;

Figure 2 is a schematic block diagram of a detecting image processing
apparatus;

Figure 3A is a representation of an original image, Figure 3B is a
representation of a marked image and Figure 3C is the marked image after registration;

Figure 4 is a graphical representation of an example correlation result for each
of N code words in a set of code words;

Figure 5A is a graphical representation of samples of the original image I,
Figure 5B is a graphical representation of samples of the watermarked image W, and
Figure 5C is a graphical representation of correlation results for the original image and
the watermarked image with respect to discrete sample shifts;

Figure 6 is a schematic representation of an encoding process in which each
part of a code word is combined with one of the images of a video sequence;

Figure 7 is a schematic representation of a recovery decoding process in which
the parts of the code word are recovered from video images;

Figure 8 is a schematic representation of a detection process embodying the
invention in which the parts of the code word recovered from the images of Figure 7
are used to form different correlation values in a hierarchical manner;

Figure 9 is a graphical representation of dependent correlation values with
respect to each of the hierarchical parts of the code word illustrated in Figure 8; and

Figure 10 is a schematic block diagram of a Fourier transform correlator
forming part of the detecting data processing apparatus shown in Figure 2.
Description of Preferred Embodiments

Watermarking System Overview 

An example embodiment of the present invention will now be described with
reference to protecting video images. The number of users to which the video images
are to be distributed determines the number of copies. To each copy an identification
code word is added which identifies the copy assigned to one of the users.

Video images are one example of material, which can be protected by
embedding a digital code word. Other examples of material, which can be protected
by embedding a code word, include software programs, digital documents, music,
audio signals and any other information-bearing signal.

An example of an encoding image processing apparatus, which is arranged to
introduce an identification code word into a copy of an original image, is shown in
Figure 1. An original image I is received from a source and stored in a frame store 1.
This original image is to be reproduced as a plurality of watermarked copies, each of
which is marked with a uniquely identifiable code word. The original image is passed
to a Discrete Cosine Transform (DCT) processor 2, which divides the image into 8x8
pixel blocks and forms a DCT of each of the 8x8 pixel blocks. The DCT processor 2
therefore forms a DCT transformed image V.

In the following description the term "samples" will be used to refer to discrete
samples from which an image (or indeed any other type of material) is comprised. The
samples may be luminance samples of the image, which are otherwise produced from
the image pixels. Therefore, where appropriate the terms samples and pixels are
interchangeable.

The DCT image V is fed to an encoding processor 4. The encoding processor 4
also receives identification code words from an identification code word generator 8.

The code word generator 8 is provided with a plurality of seeds, each seed
being used to generate one of the corresponding code words. Each of the generated
code words may be embedded in a copy of the original image to form a watermarked
image. The code word generator 8 is provided with a pseudo random number
generator. The pseudo random number generator produces the code word coefficients
to form a particular code word. In preferred embodiments the coefficients of the code
words are generated in accordance with a normal distribution. However, the
coefficients of the code word are otherwise predetermined in accordance with the seed,
which is used to initialise the random number generator. Thus for each code word
there is a corresponding seed which is stored in a data store 12. Therefore it will be
understood that to generate the i-th code word, seed_i is retrieved from memory 12 and
used to initialise the random number generator within the code word generator 8.
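
The seed-driven regeneration described above can be sketched as follows. This is a minimal illustration only: the function name `generate_code_word`, the seed values and the use of Python's `random.Random` as the pseudo random number generator are assumptions made for the example. The description itself only requires that each stored seed reproduces its own normally distributed code word.

```python
import random


def generate_code_word(seed, n):
    """Regenerate the n coefficients of the code word identified by `seed`."""
    rng = random.Random(seed)  # the seed initialises the pseudo random generator
    # Coefficients are drawn from a normal distribution (zero mean, unit variance
    # is an assumption of this sketch).
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Hypothetical contents of the seed store: one seed per code word.
seeds = [101, 202, 303]
code_words = [generate_code_word(seed, 8) for seed in seeds]

# The encoder and the detector can both rebuild a code word from its seed alone.
regenerated = generate_code_word(seeds[0], 8)
```

Because the generator is deterministic for a given seed, only the seeds need to be stored, not the code words themselves.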

In the following description the DCT version of the original image is
represented as V, where:

$$V = \{v_i\} = \{v_1, v_2, v_3, v_4, \ldots, v_N\}$$

and $v_i$ are the DCT coefficients of the image. In other embodiments the
samples of the image $v_i$ could represent samples of the image in the spatial domain or
in an alternative domain.

Each of the code words $X^i$ comprises a plurality of n code word coefficients,
where:

$$X^i = \{x^i_1, x^i_2, x^i_3, x^i_4, \ldots, x^i_n\}$$

The number of code word coefficients n corresponds to the number of samples
of the original image V. However, a different number of coefficients is possible, and
will be set in dependence upon a particular application.

A vector of code word coefficients $X^i$ forming the i-th code word is then passed
via channel 14 to the encoder 4. The encoder 4 is arranged to form a watermarked
image $W^i$ by adding the code word $X^i$ to the image V. Effectively, therefore, as
represented in the equation below, each of the code word coefficients is added to a
different one of the coefficients of the image to form the watermarked image $W^i$.

$$W^i = v_1 + x^i_1,\; v_2 + x^i_2,\; v_3 + x^i_3,\; v_4 + x^i_4,\; \ldots,\; v_n + x^i_n$$

As shown in Figure 1, the watermarked images W are formed at the output of
the image processing apparatus by the inverse DCT processor 18, which forms an
inverse DCT of the image produced at the output of the encoding processor 4.
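
The coefficient-by-coefficient addition performed by the encoder can be sketched as below. The sketch assumes that the image is already available as a flat list of DCT coefficients; the forward and inverse DCT stages are omitted, and the function name `embed_code_word` and the sample values are hypothetical.

```python
def embed_code_word(v, x):
    """Return marked coefficients w_k = v_k + x_k for DCT coefficients v and code word x."""
    if len(v) != len(x):
        raise ValueError("code word and image need the same number of coefficients")
    return [vk + xk for vk, xk in zip(v, x)]


# Illustrative values only: four DCT coefficients and a four-coefficient code word.
v = [52.0, -3.5, 1.25, 0.0]
x = [0.5, -0.25, 1.0, -1.5]
w = embed_code_word(v, x)  # [52.5, -3.75, 2.25, -1.5]
```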

Therefore as represented in Figure 1 at the output of the encoder 4 a set of the
watermarked images can be produced. For a data word of up to 20-bits, one of
10 000 000 code words can be selected to generate 10 million watermarked versions of the
original image I.

Although the code word provides the facility for uniquely identifying a marked
copy of the image I, in other embodiments the 20-bits can provide a facility for
communicating data within the image. As will be appreciated therefore, the 20-bits
used to select the identification code word can provide a 20-bit pay-load for
communicating data within the image V.

The encoding image processing apparatus which is arranged to produce the
watermarked images shown in Figure 1 may be incorporated into a variety of products
for different scenarios in which embodiments of the present invention find application.
For example, the encoding image processing apparatus may be connected to a web site
or web server from which the watermarked images may be downloaded. Before
downloading a copy of the image, a unique code word is introduced into the
downloaded image, which can be used to detect the recipient of the downloaded image
at some later point in time.

In another application the encoding image processor forms part of a digital
cinema projector in which the identification code word is added during projection of
the image at, for example, a cinema. Thus, the code word is arranged to identify the
projector and the cinema at which the images are being reproduced. Accordingly, the
identification code word can be identified within a pirate copy produced from the
images projected by the cinema projector in order to identify the projector and the
cinema from which pirate copies were produced. Correspondingly, a watermarked
image may be reproduced as a photograph or printout in which a reproduction or copy
may be made and distributed. Generally therefore, the distribution of the watermarked
images produced by the encoding image processing apparatus shown in Figure 1 is
represented by a distribution cloud 19.
Detecting Processor 

A detecting data processing apparatus which is arranged to detect one or more
of the code words, which may be present in an offending marked material is shown in
Figure 2. Generally, the data processing apparatus shown in Figure 2 operates to
identify one or more of the code words, which may be present in an offending copy of
the material.

The offending version of a watermarked video image W' is received from a
source and stored in a frame store 20. Also stored, in a frame store 24, is the original
version of the video image I, since the detection process performed by the detecting
apparatus requires the original version of the video image. The offending watermarked
image W' and the original version of the image are then fed via connecting channels
26, 28 to a registration processor 30.

As already explained, the offending version of the image W' may have been
produced by photographing or otherwise reproducing a part of the watermarked image
W. As such, in order to improve the likelihood of detecting the identification code
word, the registration processor 30 is arranged to substantially align the offending
image with the original version of the image present in the data stores 20 and 24. The
purpose of this alignment is to provide a correspondence between the original image
samples I and the corresponding samples of the watermarked image W' to which the
code word coefficients have been added.

The effects of the registration are illustrated in Figures 3A, 3B and 3C. In
Figure 3A an example of the original image I is shown with respect to an offending
marked version of the image W' in Figure 3B. As illustrated in Figure 3B, the
watermarked image W' is offset with respect to the original image I and this may be
due to the relative aspect view of the camera from which the offending version of the
watermarked image was produced.

In order to recover a representation of the code word coefficients, the correct
samples of the original image should be subtracted from the corresponding samples of
the marked offending image. To this end, the two images are aligned. As shown in
Figure 3C, the registered image W'' has a peripheral area PA which includes parts
which were not present in the original image.

As will be appreciated in other embodiments, the registration processor 30 may
not be used because the offending image W' may be already substantially aligned to
the original version of the image I, such as, for example, if the offending version was
downloaded via the Internet. Accordingly, the detecting apparatus is provided with an
alternative channel 32, which communicates the marked image directly to the recovery
processor 40.

The registered image W'' is received by a recovery processor 40. The recovery
processor 40 also receives a copy of the original image I via a second channel 44. The
registered image W'' and the original image I are transformed by a DCT transform
processor 46 into the DCT domain. An estimated code word X' is then formed by
subtracting the DCT domain samples of the original image V from the samples of the
DCT domain marked image V', as expressed by the following equation:

$$X' = v'_1 - v_1,\; v'_2 - v_2,\; v'_3 - v_3,\; v'_4 - v_4,\; \ldots,\; v'_n - v_n$$

The output of the recovery processor 40 therefore provides on a connecting
channel 50 an estimate of the coefficients of the code word which is to be identified.
The recovered code word X' is then fed to a first input of a correlator 52. The
correlator 52 also receives on a second input the regenerated code words $X^i$ produced
by the code word generator 54. The code word generator 54 operates in the same way
as the code word generator 8 which produces all possible code words of the set, using
the predetermined seeds which identify uniquely the code words from a store 58.
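
The subtraction carried out by the recovery processor can be sketched as follows, again assuming that both images are already registered and expressed as flat lists of DCT coefficients. The function name `recover_code_word` and the sample values are hypothetical.

```python
def recover_code_word(v_marked, v_original):
    """Estimate the code word as x'_k = v'_k - v_k for each DCT coefficient."""
    if len(v_marked) != len(v_original):
        raise ValueError("images must have the same number of coefficients")
    return [vm - vo for vm, vo in zip(v_marked, v_original)]


# Illustrative values: an original image and a marked copy of it.
v_original = [52.0, -3.5, 1.25, 0.0]
v_marked = [52.5, -3.75, 2.25, -1.5]
x_estimate = recover_code_word(v_marked, v_original)  # [0.5, -0.25, 1.0, -1.5]
```

In practice the marked copy will also carry noise and processing distortion, so the estimate only approximates the embedded code word; this is why it is correlated against the re-generated code words rather than compared for equality.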

The correlator 52 forms n similarity sim(i) values. In one embodiment, the
similarity value is produced by forming a correlation in accordance with the following
equation:

$$\mathrm{sim}(i) = \frac{X^i \cdot X'}{\sqrt{X' \cdot X'}} = \frac{x^i_1 x'_1 + x^i_2 x'_2 + x^i_3 x'_3 + \cdots + x^i_n x'_n}{\sqrt{x'_1 x'_1 + x'_2 x'_2 + x'_3 x'_3 + \cdots + x'_n x'_n}}$$

Each of the n similarity values sim(i) is then fed to a detector 60. The detector
60 then analyses the similarity values sim(i) produced for each of the n possible code
words. As an example, the similarity values produced by the correlator 52 are shown
in Figure 4 with respect to a threshold TH for each of the possible code words. As
shown in Figure 4, two code words are above the threshold, 2001 and 12345. As such,
the detecting processor concludes that the recipients of the watermarked versions
associated with code word 2001 and code word 12345 must have colluded in order to
form the offending image. The height of the threshold TH can therefore be set from the
population size, which in this case is 10 million, and the watermarking strength α, in
order to guarantee a desired false positive detection probability. As in the example in
Figure 4, if the correlation values produced by the correlator 52 exceed the threshold
then, with this false positive probability, the recipients of the marked image are
considered to have colluded to form the offending watermarked version of the image W'.
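
The correlation and threshold test can be sketched as below. This is an illustration under stated assumptions, not the detector of Figure 2: the set of 50 code words, the code word length, the averaging collusion model and the threshold value `TH` are all hypothetical, and the threshold is chosen by hand rather than derived from a false positive probability.

```python
import math
import random


def generate_code_word(seed, n):
    """Regenerate an n-coefficient, normally distributed code word from its seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sim(x_regen, x_rec):
    """Correlate a re-generated code word with the recovered code word X'."""
    numerator = sum(a * b for a, b in zip(x_regen, x_rec))
    denominator = math.sqrt(sum(b * b for b in x_rec))
    return numerator / denominator if denominator else 0.0


def detect(x_rec, code_words, threshold):
    """Return the indices of all code words whose sim(i) exceeds the threshold."""
    return [i for i, x in enumerate(code_words) if sim(x, x_rec) > threshold]


# Hypothetical set of 50 code words with 1000 coefficients each.
n = 1000
code_words = [generate_code_word(seed, n) for seed in range(50)]

# Assumed collusion model: recipients 7 and 21 average their copies, so the
# recovered code word is the average of their two code words.
x_rec = [(a + b) / 2.0 for a, b in zip(code_words[7], code_words[21])]

TH = 6.0  # illustrative threshold only
guilty = detect(x_rec, code_words, TH)
```

Under these assumptions the two colluding code words give similarity values far above those of the other 48 code words, which behave approximately like unit-variance noise.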
Registration

The process of aligning the offending marked version of the image with the
copy of the original image comprises correlating the samples of the original image
with respect to the marked image. The correlation is performed for different shifts of
the respective samples of the images. This is illustrated in Figure 5.

Figure 5A provides an illustration of discrete samples of the original image I,
whereas Figure 5B provides an illustration of discrete samples of the offending
watermarked image W'. As illustrated in the Figures 5A and 5B, the sampling rate
provides a temporal difference between samples of dt. A result of shifting each of the
sets of samples from the images and correlating the discrete samples is illustrated in
Figure 5C.

As shown in Figure 5C, for a shift of between 7 and 8 samples, the correlation
peak is highest. The offending watermarked image is therefore shifted by this amount
with respect to the original image to perform registration.
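
The shift search can be sketched in one dimension as follows. The sketch makes several assumptions that are not in the description: it uses zero-mean synthetic samples, whole-sample shifts only (so it cannot resolve a peak lying between 7 and 8 samples), and the hypothetical function names `correlation_at_shift` and `best_shift`.

```python
import random


def correlation_at_shift(original, marked, shift):
    """Correlate original[k] with marked[k + shift] over the overlapping samples."""
    total = 0.0
    for k, sample in enumerate(original):
        j = k + shift
        if 0 <= j < len(marked):
            total += sample * marked[j]
    return total


def best_shift(original, marked, max_shift):
    """Return the shift in [-max_shift, max_shift] with the highest correlation."""
    shifts = range(-max_shift, max_shift + 1)
    return max(shifts, key=lambda s: correlation_at_shift(original, marked, s))


rng = random.Random(1)
original = [rng.gauss(0.0, 1.0) for _ in range(200)]  # zero-mean test samples

# Build a marked copy displaced by 7 samples, with a weak additive disturbance
# standing in for the watermark.
true_shift = 7
marked = [0.0] * len(original)
for k in range(len(original) - true_shift):
    marked[k + true_shift] = original[k] + 0.1 * rng.gauss(0.0, 1.0)

estimated = best_shift(original, marked, max_shift=20)
```

Once the shift is estimated, the marked samples can be moved back by that amount before the original is subtracted.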
Improved Decoding 

An improved detecting process with respect to the general
detecting process described above will now be described, which is also disclosed in
co-pending UK patent application 0129840.5. As explained above the encoding data
processing apparatus is arranged to introduce a code word into a sequence of video
images, which typically form a moving image sequence and may be for example a
sequence of MPEG compression encoded images. According to an aspect of the
present invention the encoder is arranged to divide the code word into a plurality of
parts and to embed each part into a corresponding plurality of video images.
An illustration of the encoding process is shown in Figure 6. As shown in
Figure 6 parts into which a code word is divided are embedded into a plurality of
video images I0, I1, I2, I3 and so on. Each part of the code word is embedded into a
corresponding one of the video images.
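
Dividing a code word into parts and embedding one part per image can be sketched as below. The sketch assumes that the code word length is an exact multiple of the number of images and that each image, given as a flat list of DCT coefficients, has as many coefficients as one part; the function names and values are hypothetical.

```python
def split_code_word(x, num_images):
    """Divide code word x into num_images equal, consecutive parts."""
    if len(x) % num_images != 0:
        raise ValueError("code word length must be a multiple of the number of images")
    size = len(x) // num_images
    return [x[k * size:(k + 1) * size] for k in range(num_images)]


def embed_in_sequence(frames, x):
    """Add the k-th part of the code word to the coefficients of the k-th frame."""
    parts = split_code_word(x, len(frames))
    return [[v + p for v, p in zip(frame, part)] for frame, part in zip(frames, parts)]


# Illustrative values: a six-coefficient code word spread over three frames.
x = [1.0, -1.0, 0.5, -0.5, 2.0, -2.0]
frames = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
marked = embed_in_sequence(frames, x)
# marked == [[11.0, 19.0], [30.5, 39.5], [52.0, 58.0]]
```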
As will be explained shortly, embodiments of the present invention can provide
an improvement in detecting code words with which a material item has been
watermarked. For the present example the material comprises video images, which are
suspected as having been generated from a pirated copy of a marked version of the
original. As already explained, to accuse a recipient of the marked version, the code
word corresponding to that recipient must be detected in the video images.

One factor in reducing the probability of correctly detecting a code word which
is present in a marked material item is corruption or other noise which may have been
introduced into units of which the material is comprised. A result of this corruption is to
reduce the correlation value sim(i), as a result of including corrupted parts of the code
word recovered from the corrupted material units in the calculation of the correlation
value. The corrupted parts can have a detrimental effect on the correlation value sim(i)
calculated for a particular code word. For the present example, the effect of some
corrupted video images can prevent a sim(i) value exceeding the threshold for a code
word which is present in the marked video material. This is because the sim(i)
calculation described above provides a correlation value of the re-generated code word
with respect to the recovered code word, and can be upset by the presence of noise or
corruption in the suspect video images. This can have an effect of biasing the sim(i)
calculation to prevent the correlation value exceeding the threshold for the correct
code word.

As explained above and according to the previously proposed watermarking
system disclosed in UK patent applications 0129840.5, 0129836.3, 0129865.2,
0129907.2 and 0129841.3, the whole code word is recovered from the video images
and used to form the correlation value. Accordingly, the calculation of the sim(i)
provides a correlation value, which is independent of local statistical variations of the
content of the video images and the content of the parts of the code word.

To address a problem associated with a reduction in the correlation value
resulting from corrupt video images, preventing an otherwise present code word from
exceeding a correlation threshold, a dependent correlation value is formed.
Embodiments of the present invention can provide a detecting apparatus which is
arranged to detect the presence of a code word in a sequence of video images by
forming a dependent correlation value from the separate parts of the code word.
The dependent correlation value is formed by calculating the correlation value
sim(i), of a part of the code word only. The part of the code word is recovered from
one of the video images, and is correlated with a corresponding part of each of the
code words of the set. If the dependent correlation value is sufficient to exceed the
threshold according to the predetermined false detection probability, then a code word
can be declared as being present. However, if the dependent correlation value sim(i),
calculated for the part of the code word recovered from a video image is not sufficient
to exceed the threshold, then the part of the code word is combined with a part of the
code word recovered from a subsequent image in the video sequence and the dependent
correlation value sim(i) re-calculated.

The dependent correlation value is formed by combining the parts of the code
word recovered from a plurality of successive video images and re-calculating the
dependent correlation value sim(i) with respect to the corresponding part of each
re-generated code word. If the dependent correlation value sim(i) for the plurality of
successive images does not exceed the threshold, then the parts of the code word
recovered from the next plurality of successive images are combined and the
dependent correlation value for these parts determined. If the threshold is exceeded
then the corresponding code word is identified as being present. However, if the
threshold is not exceeded, then the parts of the code word from the first plurality of
images are combined with the parts from the second plurality of images. For the
combined parts providing an increased code word length, the dependent correlation
value is re-calculated, with a proportionally increased likelihood of exceeding the
threshold. This process is repeated for a subsequent plurality of images, providing yet
further correlation values and increasing the length of the part of the code word used to
determine the dependent correlation value in a hierarchical fashion.
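
The benefit of a longer combined part can be illustrated with a rough estimate. The following is a sketch under simplifying assumptions that are not stated in the description: the code word coefficients are independent with zero mean and unit variance, and the recovered part of length m equals the embedded part plus independent noise of variance $\sigma^2$. For the code word that is present, the numerator of sim(i) is then close to m and the energy of the recovered part is close to $m(1+\sigma^2)$, so that

$$\mathrm{sim}(i) = \frac{X^i \cdot X'}{\sqrt{X' \cdot X'}} \approx \frac{m}{\sqrt{m\,(1+\sigma^2)}} = \sqrt{\frac{m}{1+\sigma^2}}$$

whereas for a code word that is not present the numerator has zero mean and the similarity value remains of the order of one. Under these assumptions, doubling the number of combined images doubles m and raises the expected similarity value by a factor of about $\sqrt{2}$, provided the added images are not themselves heavily corrupted (that is, provided $\sigma^2$ does not grow).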

If the dependent correlation value for any part of the code word includes parts,
which have been corrupted, then the dependent correlation value produced from these
parts is unlikely to exceed the threshold. However, since other dependent correlation
values will not include these corrupted images, then these dependent correlation values
may exceed the threshold, whereas the independent correlation value determined for the
entire video sequence may not have exceeded the threshold. This is because the parts
of the code words from the corrupted images would be included in the calculation of
the independent correlation value.

The operation of the data processing apparatus shown in Figure 2 to detect a 
code word from a dependent correlation value sim(i) will now be described with 
reference to Figures 7 and 8. 

As illustrated in Figure 7, the recovery processor 40 operates substantially as
described above to generate a recovered part of the code word X' from each image of
the suspect video sequence. Each recovered code word part X' is then fed to the
correlator 52 via the first input. As explained above, the correlator 52 receives a
corresponding part of the regenerated code words produced by the code word generator
54, and forms n similarity sim(i) values, one for the correlation of the recovered code
word part and each of the n re-generated code word parts $X^i$. As explained above the
detector 60 is arranged to determine which of the dependent correlation values sim(i)
exceeds the threshold TH determined in accordance with a desired false negative
detection probability. However, in some embodiments the detector may identify a
largest of the sim(i) values and only calculate subsequent dependent correlation values
for the corresponding code word, in order to reduce an amount of computation required
to detect a code word. The operation of the detector 60 to detect a code word in
accordance with a dependent sim(i) value will now be described with reference to
Figure 8.

Figure 8 provides a hierarchical representation of an arrangement for
combining parts of recovered code words to form dependent correlation values.
Along a horizontal axis representing a first hierarchical level HL1 the parts of the
recovered code words shown in Figure 7 are presented. The correlation value for each
of these recovered code word parts is calculated by the correlator 52, under the control
of the detector 60. The sim(i) values for each video image or correspondingly each
recovered code word part for the first hierarchical level HL1 are represented graphically
in Figure 9 with respect to the threshold TH. As will be seen in Figure 9, none of the
sim(i) values calculated for the individual images exceeds the threshold TH. For this
reason the detector 60 proceeds to the next hierarchical level HL2 and combines parts
of successive pairs of images to form a dependent correlation value for two successive
images. The dependent correlation value for the second hierarchical level HL2 is
shown plotted with the dependent correlation values sim(i) for the first level HL1 in
Figure 9. If none of the correlation values at the second hierarchical level HL2
exceeds the threshold, then the detector proceeds to the third level HL3, where the
parts of the code word formed in the second hierarchical level HL2 are combined to
calculate dependent correlation values sim(i) for four successive images in the third
hierarchical level HL3.

As illustrated in Figure 9, the correlation value for the first set of four images
(0, 1, 2, 3) exceeds the threshold TH. Accordingly, at this point the detector stops
processing and declares the recipient of the video sequence corresponding to the
detected code word as guilty. However, it will be appreciated that, if the threshold for
a code word was not exceeded at the third hierarchical level HL3, then processing
would proceed to a fourth hierarchical level HL4, where parts of the code word for
eight successive images are combined to form a dependent correlation value, and so on
in an iterative manner.
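
The level-by-level procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, each image is assumed to contribute one list of coefficients, parts are combined by concatenation, and sim is taken to be a correlation sum normalised by the energy of the recovered part, which is a common form but is not confirmed by this passage.

```python
import math

def sim(recovered, regenerated):
    """Normalised correlation of two equal-length coefficient lists.

    The normalisation by the recovered energy is an assumption.
    """
    energy = math.sqrt(sum(r * r for r in recovered))
    if energy == 0.0:
        return 0.0
    return sum(r * g for r, g in zip(recovered, regenerated)) / energy

def hierarchical_detect(recovered_parts, regenerated_parts, threshold):
    """Return (level, first_image, value) for the first group of images whose
    dependent correlation value exceeds the threshold, or None otherwise.

    Level 1 tests single images, level 2 successive pairs, level 3 groups
    of four, and so on, doubling the group size at each level.
    """
    n_images = len(recovered_parts)
    group, level = 1, 1
    while group <= n_images:
        for start in range(0, n_images - group + 1, group):
            # Concatenate the parts of `group` successive images.
            rec = [c for p in recovered_parts[start:start + group] for c in p]
            reg = [c for p in regenerated_parts[start:start + group] for c in p]
            value = sim(rec, reg)
            if value > threshold:
                return level, start, value
        group *= 2
        level += 1
    return None

# Four images, each carrying a four-coefficient part of the code word.
regenerated = [[1.0, -1.0, 1.0, -1.0] for _ in range(4)]
recovered = [list(part) for part in regenerated]
result = hierarchical_detect(recovered, regenerated, threshold=3.5)
```

In this toy input no single image and no pair reaches the threshold of 3.5, so detection only succeeds at the third level, where the parts of images 0 to 3 are combined.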

Embodiments of the invention utilise a general likelihood that the quality of the
parts of code words recovered from video images of a suspect video sequence is
correlated. The correlation has an effect that corrupted images are more likely to occur
together, and correspondingly good quality images are also more likely to occur
together. As a result, by calculating dependent correlation values by combining code
word parts from successive images, in iteratively increasing numbers, an improvement
in the likelihood of correctly detecting a code word as being present is provided. The
process proceeds until the dependent correlation value exceeds the determined
threshold, thereby providing an improved likelihood of correctly detecting a given
code word. Correspondingly the false detection probability is reduced.
Fourier Decoding

A correlator in accordance with an embodiment of the present invention is
illustrated in Figure 10. The correlator shown in Figure 10 takes advantage of a
technique for calculating the correlation sum sim(i) shown above. In accordance with
this technique the correlation sum is calculated in accordance with the following
equation:

$F^{-1}\left[F(X')\,F(X^{(1)})^{*}\right]$, where $F(A)$ is the Fourier transform of $A$ and $F^{-1}(A)$ is
the inverse Fourier transform of $A$. The correlator is also described in UK patent
application number 0129840.5.
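
For reference, the identity behind this equation can be written out explicitly. The sketch below assumes real-valued code words of length n, zero-based indices taken modulo n, and the common sign convention for the discrete Fourier transform; the symbol c_k for the k-th output sample is introduced here for illustration only, and any normalisation applied to sim(i) is left out.

```latex
\begin{aligned}
F(A)_m &= \sum_{j=0}^{n-1} a_j \, e^{-2\pi \mathrm{i}\, jm/n}, \\
c_k &= F^{-1}\!\left[ F(X')\, F(X^{(1)})^{*} \right]_k
     = \sum_{j=0}^{n-1} x'_{(j+k) \bmod n} \; x^{(1)}_{j},
     \qquad k = 0, \ldots, n-1 .
\end{aligned}
```

Each output sample is therefore the correlation sum of the recovered code word with one cyclic shift of the regenerated code word, so a single inverse transform yields all n sums.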

The correlator 52 shown in Figure 10 comprises a first Fourier transform
processor 100, and a second Fourier transform processor 102. Fourier transform
processors 100, 102 may be implemented using Fast Fourier transform algorithms. The
second Fourier transform processor 102 also forms the complex conjugate of the
Fourier transform of the regenerated code word X^(1). The Fourier transform of the
recovered code word X' and the complex conjugate of the Fourier transform of the
regenerated code word X^(1) are fed to first and second inputs of a multiplier 110. The
multiplier 110 multiplies the respective samples from each of the Fourier transform
processors 100, 102 and feeds the multiplied samples to an inverse Fourier transform
processor 112. At the output of the correlator an inverse Fourier transform of the
multiplied signal samples is formed.

As will be appreciated, the implementation of the correlator 52 shown in Figure
10 provides an advantage in terms of time taken to compute the correlation for the n
sample values of the regenerated code word X^(1) and the recovered code word X'. This
is because the Fourier processors 100, 102, 112 can be formed from FFT integrated
circuits such as, for example, are available as ASICs. Furthermore, the inverse Fourier
transform provided at the output of the correlator 52 provides n similarity values sim(i)
corresponding to n correlation sums. However, in order to utilise the properties of the
correlator 52 shown in Figure 10, the code words are arranged to be generated by
cyclically shifting one code word X^(1) generated using a particular seed for the random
number generator. This is illustrated below.

As represented below, the first code word X^(1) is represented as values x_1 to
x_n, which correspond to the pseudo randomly produced numbers from the code word
generator 8. However, the second code word X^(2) is produced by performing a cyclic
shift on the first code word X^(1). Correspondingly, each of the other code words is
produced by cyclically shifting the code word X^(1) further, until the n-th code word is
a code word shifted by n-1 positions.

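$$
\begin{aligned}
X^{(1)} &\rightarrow (x_1, x_2, x_3, x_4, \ldots, x_{n-1}, x_n) \\
X^{(2)} &\rightarrow (x_2, x_3, x_4, \ldots, x_{n-1}, x_n, x_1) \\
X^{(3)} &\rightarrow (x_3, x_4, \ldots, x_{n-1}, x_n, x_1, x_2) \\
&\;\;\vdots \\
X^{(n)} &\rightarrow (x_n, x_1, x_2, x_3, x_4, \ldots, x_{n-1})
\end{aligned}
$$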
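
A minimal Python sketch of this construction follows. The function name and the choice of generator are illustrative assumptions; the only properties taken from the text are that the first code word is produced pseudo-randomly from a seed, with normally distributed coefficients, and that every other code word is a cyclic shift of it.

```python
import random

def generate_code_word_set(seed, n):
    """Build n code words of length n: a seeded pseudo-random first word
    followed by its n-1 cyclic shifts."""
    rng = random.Random(seed)
    first = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Word i (zero-based) is the first word rotated left by i positions,
    # so the last word is the first word shifted by n-1 positions.
    return [first[i:] + first[:i] for i in range(n)]

code_words = generate_code_word_set(seed=7, n=5)
```

Because the whole set is determined by the seed, a decoder that knows the seed only needs to regenerate the first code word.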

By using this set of code words to form part of, or the whole of, the set of code
words produced by the encoding image processor, the Fourier transform correlator 52
can be used to generate in one operation all similarity values for all of the n code
words. Therefore, as illustrated above, the corresponding shift of 1 to n of the original
code word provides the n similarity values sim(i), and as illustrated in Figure 4, for at
least one of the code words, a large similarity value sim(i) is produced. Therefore, as
will be appreciated, the correlator 52 only receives one regenerated code word
corresponding to the first code word X^(1) to form the similarity values for the set of n
code words as illustrated in Figure 4. More details of the Fourier transform correlator
are provided in UK Patent application number 0129840.5.
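
The following Python sketch illustrates how one pass through a Fourier transform correlator can return a correlation value for every cyclically shifted code word. It is a sketch under assumptions rather than the implementation of the referenced application: a naive discrete Fourier transform stands in for an FFT, the function names are hypothetical, the correlation sums are left unnormalised, and the mapping from output sample to shift depends on the transform sign convention chosen here.

```python
import cmath
import random

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * t * k / n)
        out.append(acc / n if inverse else acc)
    return out

def correlate_all_shifts(recovered, first_word):
    """Correlate `recovered` with every cyclic shift of `first_word`.

    Entry s of the result is the correlation sum with `first_word`
    rotated left by s positions.
    """
    n = len(first_word)
    spectrum_rec = dft(recovered)
    spectrum_reg = [z.conjugate() for z in dft(first_word)]
    product = [a * b for a, b in zip(spectrum_rec, spectrum_reg)]
    c = [z.real for z in dft(product, inverse=True)]
    # With this sign convention, sample (n - s) % n holds the sum for shift s.
    return [c[(n - s) % n] for s in range(n)]

rng = random.Random(3)
first = [rng.gauss(0.0, 1.0) for _ in range(16)]
shift = 5
recovered = first[shift:] + first[:shift]   # noise-free copy of one code word
values = correlate_all_shifts(recovered, first)
best = max(range(len(values)), key=lambda s: values[s])
```

Here `best` identifies the shift, and hence the code word, with the largest correlation sum; in practice the recovered code word would also contain noise and the values would be compared against the threshold TH.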

As explained above, the correlation values sim(i) are formed using the Fourier
transform correlator 52. To form a dependent correlation value for a part of the code
word, the coefficients of the recovered code word other than those of the recovered
part are set to zero. Correspondingly, for the re-generated code word a
part corresponding to the recovered part is reproduced and the coefficients of the
remaining parts of the re-generated code word are set to zero. Fourier transforms are
then formed for the recovered and the re-generated parts. Alternatively, instead of setting
the remaining parts of the recovered and the re-generated code words to zero, the
absent parts are simply not used to form the Fourier transform.
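
The effect of setting the remaining coefficients to zero can be illustrated with a short Python sketch. The helper names are hypothetical, and the example only covers the aligned, unshifted case: zeroed coefficients contribute nothing to the correlation sum, so the sum over the full-length masked words equals the sum over the part alone.

```python
def mask_to_part(code_word, start, length):
    """Keep coefficients [start, start + length) and zero all the others."""
    return [c if start <= i < start + length else 0.0
            for i, c in enumerate(code_word)]

def correlation(a, b):
    """Plain correlation sum of two equal-length coefficient lists."""
    return sum(x * y for x, y in zip(a, b))

recovered = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
regenerated = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

# Correlate only the part carried by coefficients 2 and 3.
masked = correlation(mask_to_part(recovered, 2, 2),
                     mask_to_part(regenerated, 2, 2))
part_only = correlation(recovered[2:4], regenerated[2:4])
```

Both `masked` and `part_only` give the same value, which is why the two alternatives described above are interchangeable for an aligned part.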

As will be appreciated, instead of forming the conjugate of the Fourier
transform of the regenerated first code word X^(1), the conjugate of the Fourier
transform of the recovered code word could be formed. This is expressed by the
second alternative of the Fourier transform correlator shown below:

$F^{-1}\left[F(X')^{*}\,F(X^{(1)})\right]$

Accordingly the conjugate of one of the Fourier transform of the recovered
code word and the Fourier transform of the regenerated code word is formed by the
Fourier transform processors 100, 102.
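
The relationship between the two alternatives can be made explicit. The following is a sketch assuming real-valued code words of length n, zero-based indices taken modulo n, and the usual discrete Fourier transform sign convention; the symbols c_k and d_k are introduced here purely for illustration.

```latex
\begin{aligned}
c_k &= F^{-1}\!\left[ F(X')\, F(X^{(1)})^{*} \right]_k
     = \sum_{j=0}^{n-1} x'_{(j+k) \bmod n} \; x^{(1)}_{j}, \\
d_k &= F^{-1}\!\left[ F(X')^{*}\, F(X^{(1)}) \right]_k
     = \sum_{j=0}^{n-1} x'_{j} \; x^{(1)}_{(j+k) \bmod n}
     = c_{(n-k) \bmod n} .
\end{aligned}
```

Under these assumptions both forms produce the same n correlation sums, only in reversed index order, so either Fourier transform may be conjugated provided the detector maps output samples to code words accordingly.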
Application of the Watermarking System

As disclosed in co-pending UK patent applications numbered 0215495.3 and
0215513.3, a reduced-bandwidth-version of a material item may be formed to facilitate
secure distribution of the material item. The reduced-bandwidth-version may be
formed by at least one of temporally or spatially sub-sampling the original material
item. According to an application of embodiments of the invention, the code words
can be combined with the reduced-bandwidth-version of the original material item.
For video material, each part of the code word is combined with a temporally or
spatially sub-sampled video image. As explained in the above co-pending applications,
an adapted version of the original material item is formed by subtracting the
reduced-bandwidth-version from a copy of the original material item. The adapted
version is then distributed to users and the reduced-bandwidth-version provided
separately. A version of the original is reproduced by combining the adapted version
with the reduced-bandwidth-version, thereby introducing the code words into the
reproduced version of the original.
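
The arithmetic of this scheme can be summarised as follows. This is a sketch under assumptions not stated in the text: the symbols O (original), R (reduced-bandwidth-version), W (code word), A (adapted version) and O' (reproduced version) are introduced here for illustration, combination is taken to be simple addition, and R is assumed to be expressed at the same resolution as O.

```latex
\begin{aligned}
A   &= O - R, \\
R_W &= R + W, \\
O'  &= A + R_W = (O - R) + (R + W) = O + W .
\end{aligned}
```

Under these assumptions the reproduced version differs from the original only by the code word carried in the separately provided reduced-bandwidth-version.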

Various further aspects and features of the present invention are defined in the
appended claims. Various modifications can be made to the embodiments herein
before described without departing from the scope of the present invention.
CLAIMS

1. A data processing apparatus operable to identify at least one of a
plurality of code words, forming a code word set, present in a marked version of a
material item, the marked version having been formed by combining each of a
plurality of parts of a code word with one of a plurality of units from which the
material item is comprised, the apparatus comprising

a recovery processor operable to recover at least one part of the code word
from a corresponding unit of the marked material item, and

a correlator operable to generate for the marked material unit a dependent
correlation value for the part of the code word recovered from the material unit and the
corresponding part of at least one of the re-generated code words from the set, and

a detector operable to determine whether at least one of the code words is
present in the marked material item from the dependent correlation value for the part
of the code word exceeding a predetermined threshold.

2. A data processing apparatus as claimed in Claim 1, wherein the detector
is operable in combination with the correlator to form a dependent correlation value
for a plurality of parts of the recovered code word, and if the correlation value exceeds
the predetermined threshold for one of the dependent correlation values, the detector is
operable to identify the code word as present according to a predetermined false
detection probability.

3. A data processing apparatus as claimed in Claim 2, wherein the detector
is operable in combination with the correlator to form the dependent correlation values
by combining the parts of the code word recovered from successive material units, and
by correlating the parts formed from successive material units with a corresponding
part of the regenerated code word.

4. A data processing apparatus as claimed in Claim 3, wherein the
correlator is operable to form the dependent correlation values by combining the parts
of the code word recovered from a first plurality of successive units with parts of the
code word recovered from a second plurality of successive units and correlating the
combined parts with corresponding parts of the re-generated code word.

5. A data processing apparatus as claimed in any preceding Claim,
wherein the correlator is operable under control of the detector

to combine the parts of the code word recovered from a first plurality of
successive units, and to form the dependent correlation value for the combined parts,
the detector being operable to detect the code word if the dependent correlation value
exceeds the predetermined threshold and otherwise

to combine the parts of the code word recovered from a second plurality of
successive units, the number of units corresponding to the first plurality, and to form
the dependent correlation value for the combined parts, the detector being operable to
detect the code word if the dependent correlation value exceeds the predetermined
threshold and otherwise

to combine the parts of the code word recovered from the first plurality of
successive units with parts of the code word recovered from the second plurality of
successive units, and to form the dependent correlation value for the combined parts,
the detector being operable to detect the code word if the dependent correlation value
exceeds the predetermined threshold and otherwise

to combine the parts of the code word recovered from a third plurality of
successive units, and to form the dependent correlation value for the combined parts,
the detector being operable to detect the code word if the dependent correlation value
exceeds the predetermined threshold and otherwise

to combine the parts of the code word recovered from a fourth plurality of
successive units, the number of units corresponding to the third plurality, and to form
the dependent correlation value for the combined parts, the detector being operable to
detect the code word if the dependent correlation value exceeds the predetermined
threshold and otherwise

to combine the parts of the code word recovered from the third plurality of
successive units with parts of the code word recovered from the fourth plurality of
successive units, and to form the dependent correlation value for the combined parts,
the detector being operable to detect the code word if the dependent correlation value
exceeds the predetermined threshold and otherwise to form iteratively the first, second,
third and fourth plurality of parts of the recovered code word, and to determine
whether the dependent correlation value exceeds the threshold.

6. A data processing apparatus as claimed in Claim 5, wherein the
correlator is operable under control of the detector to form an iteratively increasing
length part of the code word formed from successive material units and to determine
the dependent correlation value for the increased length part of the code word, the
iteration increasing until the whole code word is recovered and correlated with the
regenerated code word, the correlation value produced being an independent
correlation value.

7. A data processing apparatus as claimed in any preceding Claim,
wherein the detector and the correlator are operable in combination to form the
dependent correlation value for at least one selected code word re-generated from the
set of code words, the code word being selected from the set in accordance with the
relative magnitudes of the dependent correlation value formed for each code word of
the set.

8. A data processing apparatus as claimed in any preceding Claim,
wherein the plurality of code words are formed from a first code word having a
plurality of predetermined pseudo-randomly distributed coefficients and by generating
other code words of the set by cyclically shifting the first code word, and the
correlation value is formed for a plurality of the code words by

forming a Fourier transform of the recovered code word,

forming a Fourier transform of the first code word of the set,

forming the complex conjugate of one of the Fourier transform of the
recovered code word and the Fourier transform of the regenerated code word,

forming intermediate product samples by multiplying each of the Fourier
transform samples of the recovered code word and the corresponding Fourier
transform samples of the first code word,

forming correlation samples by forming an inverse transform of the
intermediate product samples, each of the correlation value samples providing the
correlation value for one of the set of code words, wherein the forming a Fourier
transform of the part of the recovered code word comprises setting the remaining part
of the recovered code word to zero, and forming the Fourier transform of the recovered
code word, and

the forming a Fourier transform of the first code word of the set comprises
setting the remaining part of the first code word to zero, and forming the Fourier
transform of the first code word.

9. A data processing apparatus as claimed in any preceding Claim,
wherein the code word has been introduced into the material item in the discrete cosine
transform domain, the apparatus comprising

a discrete cosine transform processor operable to transform the marked
material item and the original material item into the discrete cosine transform domain,
wherein the recovery processor is operable to generate the recovered code word by
subtracting corresponding discrete cosine transform coefficients of the original
material version from discrete cosine transform coefficients of the marked material
version.

10. A data processing apparatus as claimed in any preceding Claim, 
wherein the material is video material, the material units being video images. 

11. A method of identifying one of a plurality of code words present in a
marked material item, the marked version having been formed by combining each of a
plurality of parts of a code word with one of a plurality of units from which the
material item is comprised, the method comprising

recovering at least one part of the code word from a corresponding plurality of
units of the marked material item, and

generating for the marked material unit a dependent correlation value for the
part of the code word recovered from the material unit and the corresponding part of at
least one of the re-generated code words from the set, and

determining whether at least one of the code words is present in the marked
material item from the dependent correlation value for the part of the code word
exceeding a predetermined threshold.

12. A method of identifying as claimed in Claim 11, wherein the generating
a dependent correlation value comprises

forming a dependent correlation value for each of a plurality of parts of the
recovered code word, and if the correlation value exceeds the predetermined threshold
for one of the dependent correlation values,

identifying the code word as present according to a predetermined false
detection probability.

13. A method of identifying as claimed in Claim 12, wherein the generating
a dependent correlation value includes forming the dependent correlation values by
combining the parts of the code word recovered from successive material units, and by
correlating the parts formed from successive units with a corresponding part of the
regenerated code word.

14. A method of identifying as claimed in Claim 13, wherein the generating
a dependent correlation value includes forming the dependent correlation values by
combining the parts of the code word recovered from a first plurality of successive
units with parts of the code word recovered from a second plurality of successive units
and correlating the combined parts with corresponding parts of the re-generated code
word.

15. A method of identifying as claimed in any of Claims 11 to 14, wherein
the generating a dependent correlation value includes

combining the parts of the code word recovered from a first plurality of
successive units,

forming the dependent correlation value for the combined parts, and detecting
the code word if the dependent correlation value exceeds the predetermined threshold
and otherwise

combining the parts of the code word recovered from a second plurality of
successive units, the number of units corresponding to the first plurality,

forming the dependent correlation value for the combined parts, and detecting
the code word if the dependent correlation value exceeds the predetermined threshold
and otherwise

combining the parts of the code word recovered from the first plurality of
successive units with parts of the code word recovered from the second plurality of
successive units,

forming the dependent correlation value for the combined parts, and detecting
the code word if the dependent correlation value exceeds the predetermined threshold
and otherwise

combining the parts of the code word recovered from a third plurality of
successive units,

forming the dependent correlation value for the combined parts, and detecting
the code word if the dependent correlation value exceeds the predetermined threshold
and otherwise

combining the parts of the code word recovered from a fourth plurality of
successive units, the number of units corresponding to the third plurality,

forming the dependent correlation value for the combined parts, and detecting
the code word if the dependent correlation value exceeds the predetermined threshold
and otherwise

combining the parts of the code word recovered from the third plurality of
successive units with parts of the code word recovered from the fourth plurality of
successive units,

forming the dependent correlation value for the combined parts, and detecting
the code word if the dependent correlation value exceeds the predetermined threshold
and otherwise

forming iteratively the first, second, third and fourth plurality of parts of the
recovered code word, and

determining whether the dependent correlation value exceeds the threshold.

16. A method of identifying as claimed in Claim 15, wherein the generating
a dependent correlation value includes

forming iteratively a part of the code word of increasing length, from
successive material units, and

determining the dependent correlation value for the increased length part of the
code word, the iteration increasing until the whole code word is recovered and
correlated with the regenerated code word, the correlation value produced being an
independent correlation value.

17. An encoding data processing apparatus operable to form a marked
version of a material item by combining each of a plurality of parts of a code word
with one of a plurality of units from which the material item is comprised.

18. A computer program providing computer executable instructions,
which when loaded onto a data processor configures the data processor to operate as
the data processing apparatus according to any of Claims 1 to 10.

19. A computer program providing computer executable instructions,
which when loaded on to a data processor causes the data processor to perform the
method according to any of Claims 11 to 16.

20. A computer program product having a computer readable medium 
having recorded thereon information signals representative of the computer program 
claimed in Claim 17 or 18. 


21. A data processing apparatus substantially as herein before described 
with reference to the accompanying drawings. 

22. A method of identifying at least one of a predetermined set of code
words substantially as herein before described with reference to the accompanying
drawings.
ABSTRACT

DATA PROCESSING APPARATUS AND METHOD 

A data processing apparatus is operable to identify one of a plurality of code
words present in a watermarked version of a material item. The marked version is
formed by combining each of a plurality of parts of a code word with one of a plurality
of units from which the material item is comprised. The apparatus comprises a
recovery processor operable to recover at least one part of the code word from a
corresponding unit of the marked material item, and a correlator. The correlator is
operable to generate for the marked material unit a dependent correlation value for the
part of the code word recovered from the material unit and the corresponding part of at
least one of the re-generated code words from the set. A detector is operable to
determine whether at least one of the code words is present in the marked material
item from the dependent correlation value for the part of the code word exceeding a
predetermined threshold. The data processor may detect the presence of the code word
with improved probability, in particular when parts of the material have been
corrupted.